Results 1 - 20 of 114
1.
bioRxiv ; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38585727

ABSTRACT

Analyzing taxonomic diversity and identification in diverse ecological samples has become a crucial routine in various research and industrial fields. While DNA barcoding marker-gene approaches were once prevalent, the decreasing costs of next-generation sequencing have made metagenomic shotgun sequencing more popular and feasible. In contrast to DNA barcoding, metagenomic shotgun sequencing offers possibilities for in-depth characterization of structural and functional diversity. However, analysis of such data is still considered a hurdle due to the absence of taxa-specific databases. Here we present taxonize-gb, a command-line software tool to extract the subsets of the GenBank non-redundant nucleotide and protein databases related to one or more input taxonomy identifiers. Our tool allows the creation of taxa-specific reference databases tailored to specific research questions, which reduces search times and therefore represents a practical solution for researchers analyzing large metagenomic data on a regular basis. Taxonize-gb is an open-source command-line Python-based tool freely available for installation at https://pypi.org/project/taxonize-gb/ and on GitHub https://github.com/msabrysarhan/taxonize_genbank. It is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

2.
Am J Hum Genet ; 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38636510

ABSTRACT

Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R2, corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.

3.
Hum Mol Genet ; 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38643062

ABSTRACT

Genotype imputation is widely used in genome-wide association studies (GWAS). However, both the genotyping chips and imputation reference panels are dependent on next-generation sequencing (NGS). Due to the nature of NGS, some regions of the genome are inaccessible to sequencing. To date, there has been no complete evaluation of these regions and their impact on the identification of associations in GWAS remains unclear. In this study, we systematically assess the extent to which variants in inaccessible regions are underrepresented on genotyping chips and imputation reference panels, in GWAS results and in variant databases. We also determine the proportion of genes located in inaccessible regions and compare the results across variant masks defined by the 1000 Genomes Project and the TOPMed program. Overall, fewer variants were observed in inaccessible regions in all categories analyzed. Depending on the mask used and normalized for region size, only 4%-17% of the genotyped variants are located in inaccessible regions and 52 to 581 genes were almost completely inaccessible. From the Cooperative Health Research in South Tyrol (CHRIS) study, we present a case study of an association located in an inaccessible region that is driven by genotyped variants and cannot be reproduced by imputation in GRCh37. We conclude that genotyping, NGS, genotype imputation and downstream analyses such as GWAS and fine mapping are systematically biased in inaccessible regions, due to missed variants and spurious associations. To help researchers assess gene and variant accessibility, we provide an online application (https://gab.gm.eurac.edu).

4.
Chem Senses ; 49, 2024 01 01.
Article in English | MEDLINE | ID: mdl-38452143

ABSTRACT

The sense of smell allows for the assessment of the chemical composition of volatiles in our environment. Different factors are associated with reduced olfactory function, including age, sex, as well as health and lifestyle conditions. However, most studies that aimed at identifying the variables that drive olfactory function in the population suffered from methodological weaknesses in study designs and participant selection, such as the inclusion of convenience sample or only of certain age groups, or recruitment biases. We aimed to overcome these issues by investigating the Cooperative Health Research in South Tyrol (CHRIS) cohort, a population-based cohort, by using a validated odor identification test. Specifically, we hypothesized that a series of medical, demographic and lifestyle variables is associated with odor identification abilities. In addition, our goal was to provide clinicians and researchers with normative values for the Sniffin' Sticks identification set, after exclusion of individuals with impaired nasal patency. We included 6,944 participants without acute nasal obstruction and assessed several biological, social, and medical parameters. A basic model determined that age, sex, years of education, and smoking status together explained roughly 13% of the total variance in the data. We further observed that variables related to medical (positive screening for cognitive impairment and for Parkinson's disease, history of skull fracture, stage 2 hypertension) and lifestyle (alcohol abstinence) conditions had a negative effect on odor identification scores. Finally, we provide clinicians with normative values for both versions of the Sniffin' Sticks odor identification test, i.e. with 16 items and with 12 items.


Subjects
Cognitive Dysfunction , Olfaction Disorders , Parkinson Disease , Adult , Humans , Olfaction Disorders/diagnosis , Olfaction Disorders/epidemiology , Smell , Odorants , Sensory Thresholds
5.
NAR Genom Bioinform ; 6(1): lqae015, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38327871

ABSTRACT

Genome-wide association studies (GWAS) are transforming genetic research and enable the detection of novel genotype-phenotype relationships. In the last two decades, over 60 000 genetic associations across thousands of traits have been discovered using a GWAS approach. Due to increasing sample sizes, researchers are increasingly faced with computational challenges. A reproducible, modular and extensible pipeline with a focus on parallelization is essential to simplify data analysis and to allow researchers to devote their time to other essential tasks. Here we present nf-gwas, a Nextflow pipeline to run biobank-scale GWAS analysis. The pipeline automatically performs numerous pre- and post-processing steps, integrates regression modeling from the REGENIE package and supports single-variant, gene-based and interaction testing. It includes an extensive reporting functionality that allows to inspect thousands of phenotypes and navigate interactive Manhattan plots directly in the web browser. The pipeline is tested using the unit-style testing framework nf-test, a crucial requirement in clinical and pharmaceutical settings. Furthermore, we validated the pipeline against published GWAS datasets and benchmarked the pipeline on high-performance computing and cloud infrastructures to provide cost estimations to end users. nf-gwas is a highly parallelized, scalable and well-tested Nextflow pipeline to perform GWAS analysis in a reproducible manner.

6.
Nat Commun ; 15(1): 888, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38291025

ABSTRACT

To date only a fraction of the genetic footprint of thyroid function has been clarified. We report a genome-wide association study meta-analysis of thyroid function in up to 271,040 individuals of European ancestry, including reference range thyrotropin (TSH), free thyroxine (FT4), free and total triiodothyronine (T3), proxies for metabolism (T3/FT4 ratio) as well as dichotomized high and low TSH levels. We revealed 259 independent significant associations for TSH (61% novel), 85 for FT4 (67% novel), and 62 novel signals for the T3 related traits. The loci explained 14.1%, 6.0%, 9.5% and 1.1% of the total variation in TSH, FT4, total T3 and free T3 concentrations, respectively. Genetic correlations indicate that TSH associated loci reflect the thyroid function determined by free T3, whereas the FT4 associations represent the thyroid hormone metabolism. Polygenic risk score and Mendelian randomization analyses showed the effects of genetically determined variation in thyroid function on various clinical outcomes, including cardiovascular risk factors and diseases, autoimmune diseases, and cancer. In conclusion, our results improve the understanding of thyroid hormone physiology and highlight the pleiotropic effects of thyroid function on various diseases.


Subjects
Thyroid Gland , Thyroxine , Humans , Thyroid Gland/metabolism , Thyroxine/metabolism , Genome-Wide Association Study , Triiodothyronine/metabolism , Thyrotropin/metabolism
7.
Sci Rep ; 14(1): 2083, 2024 01 24.
Article in English | MEDLINE | ID: mdl-38267512

ABSTRACT

Mitochondrial DNA copy number (mtDNA-CN) is a biomarker for mitochondrial dysfunction associated with several diseases. Previous genome-wide association studies (GWAS) have been performed to unravel underlying mechanisms of mtDNA-CN regulation. However, the identified gene regions explain only a small fraction of mtDNA-CN variability. Most of this data has been estimated from microarrays based on various pipelines. In the present study we aimed to (1) identify genetic loci for qPCR-measured mtDNA-CN from three studies (16,130 participants) using GWAS, (2) identify potential systematic differences between our qPCR-derived mtDNA-CN measurements compared to the published microarray intensity-based estimates, and (3) disentangle the nuclear from mitochondrial regulation of the mtDNA-CN phenotype. We identified two genome-wide significant autosomal loci associated with qPCR-measured mtDNA-CN: at HBS1L (rs4895440, p = 3.39 × 10^-13) and GSDMA (rs56030650, p = 4.85 × 10^-08) genes. Moreover, 113/115 of the previously published SNPs identified by microarray-based analyses were significantly equivalent with our findings. In our study, the mitochondrial genome itself contributed only marginally to mtDNA-CN regulation as we only detected a single rare mitochondrial variant associated with mtDNA-CN. Furthermore, we incorporated mitochondrial haplogroups into our analyses to explore their potential impact on mtDNA-CN. However, our findings indicate that they do not exert any significant influence on our results.


Subjects
DNA Copy Number Variations , DNA, Mitochondrial , Humans , DNA, Mitochondrial/genetics , DNA Copy Number Variations/genetics , Genome-Wide Association Study , Mitochondria/genetics , Genetic Loci , Gasdermins
8.
Cell Rep ; 43(1): 113611, 2024 01 23.
Article in English | MEDLINE | ID: mdl-38159276

ABSTRACT

Complement is a fundamental innate immune response component. Its alterations are associated with severe systemic diseases. To illuminate the complement's genetic underpinnings, we conduct genome-wide association studies of the functional activity of the classical (CP), lectin (LP), and alternative (AP) complement pathways in the Cooperative Health Research in South Tyrol study (n = 4,990). We identify seven loci, encompassing 13 independent, pathway-specific variants located in or near complement genes (CFHR4, C7, C2, MBL2) and non-complement genes (PDE3A, TNXB, ABO), explaining up to 74% of complement pathways' genetic heritability and implicating long-range haplotypes associated with LP at MBL2. Two-sample Mendelian randomization analyses, supported by transcriptome- and proteome-wide colocalization, confirm known causal pathways, establish within-complement feedback loops, and implicate causality of ABO on LP and of CFHR2 and C7 on AP. LP causally influences collectin-11 and KAAG1 levels and the risk of mouth ulcers. These results build a comprehensive resource to investigate the role of complement in human health.


Subjects
Genome-Wide Association Study , Mannose-Binding Lectin , Humans , Complement Activation , Complement System Proteins/metabolism , Lectins/metabolism , Haplotypes/genetics , Mannose-Binding Lectin/genetics
9.
Sci Rep ; 13(1): 18904, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37919319

ABSTRACT

The oral microbiota plays an important role in the exogenous nitrate reduction pathway and is associated with heart and periodontal disease and cigarette smoking. We describe smoking-related changes in oral microbiota composition and resulting potential metabolic pathway changes that may explain smoking-related changes in disease risk. We analyzed health information and salivary microbiota composition among 1601 Cooperative Health Research in South Tyrol participants collected 2017-2018. Salivary microbiota taxa were assigned from amplicon sequences of the 16S-V4 rRNA and used to describe microbiota composition and predict metabolic pathways. Aerobic taxa relative abundance decreased with daily smoking intensity and increased with years since cessation, as did inferred nitrate reduction. Former smokers tended to be more similar to Never smokers than to Current smokers, especially those who had quit for longer than 5 years. Cigarette smoking has a consistent, generalizable association on oral microbiota composition and predicted metabolic pathways, some of which associate in a dose-dependent fashion. Smokers who quit for longer than 5 years tend to have salivary microbiota profiles comparable to never smokers.


Subjects
Cigarette Smoking , Microbiota , Humans , Cross-Sectional Studies , Nitrates , Microbiota/genetics , Smokers , RNA, Ribosomal, 16S/genetics
10.
Nat Protoc ; 18(9): 2625-2641, 2023 09.
Article in English | MEDLINE | ID: mdl-37495751

ABSTRACT

The human leukocyte antigen (HLA) locus is associated with more complex diseases than any other locus in the human genome. In many diseases, HLA explains more heritability than all other known loci combined. In silico HLA imputation methods enable rapid and accurate estimation of HLA alleles in the millions of individuals that are already genotyped on microarrays. HLA imputation has been used to define causal variation in autoimmune diseases, such as type I diabetes, and in human immunodeficiency virus infection control. However, there are few guidelines on performing HLA imputation, association testing, and fine mapping. Here, we present a comprehensive tutorial to impute HLA alleles from genotype data. We provide detailed guidance on performing standard quality control measures for input genotyping data and describe options to impute HLA alleles and amino acids either locally or using the web-based Michigan Imputation Server, which hosts a multi-ancestry HLA imputation reference panel. We also offer best practice recommendations to conduct association tests to define the alleles, amino acids, and haplotypes that affect human traits. Along with the pipeline, we provide a step-by-step online guide with scripts and available software ( https://github.com/immunogenomics/HLA_analyses_tutorial ). This tutorial will be broadly applicable to large-scale genotyping data and will contribute to defining the role of HLA in human diseases across global populations.


Subjects
HLA Antigens , Histocompatibility Antigens Class I , Humans , Alleles , HLA Antigens/genetics , Genotype , Haplotypes , Amino Acids/genetics , Polymorphism, Single Nucleotide , Genome-Wide Association Study
11.
Nat Commun ; 14(1): 3377, 2023 06 08.
Article in English | MEDLINE | ID: mdl-37291107

ABSTRACT

The benefits of large-scale genetic studies for healthcare of the populations studied are well documented, but these genetic studies have traditionally ignored people from some parts of the world, such as South Asia. Here we describe whole genome sequence (WGS) data from 4806 individuals recruited from the healthcare delivery systems of Pakistan, India and Bangladesh, combined with WGS from 927 individuals from isolated South Asian populations. We characterize population structure in South Asia and describe a genotyping array (SARGAM) and imputation reference panel that are optimized for South Asian genomes. We find evidence for high rates of reproductive isolation, endogamy and consanguinity that vary across the subcontinent and that lead to levels of rare homozygotes that reach 100 times that seen in outbred populations. Founder effects increase the power to associate functional variants with disease processes and make South Asia a uniquely powerful place for population-scale genetic studies.


Subjects
Asian People , Founder Effect , Humans , Asian People/genetics , Bangladesh , Homozygote , India , Pakistan , South Asian People
12.
Arterioscler Thromb Vasc Biol ; 43(7): e254-e269, 2023 07.
Article in English | MEDLINE | ID: mdl-37128921

ABSTRACT

BACKGROUND: Antithrombin, PC (protein C), and PS (protein S) are circulating natural anticoagulant proteins that regulate hemostasis and of which partial deficiencies are causes of venous thromboembolism. Previous genetic association studies involving antithrombin, PC, and PS were limited by modest sample sizes or by being restricted to candidate genes. In the setting of the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium, we meta-analyzed across ancestries the results from 10 genome-wide association studies of plasma levels of antithrombin, PC, PS free, and PS total. METHODS: Study participants were of European and African ancestries, and genotype data were imputed to TOPMed, a dense multiancestry reference panel. Each of the 10 studies conducted a genome-wide association study for each phenotype, and summary results were meta-analyzed, stratified by ancestry. Analysis of antithrombin included 25 243 European ancestry and 2688 African ancestry participants, PC analysis included 16 597 European ancestry and 2688 African ancestry participants, and PSF and PST analyses included 4113 and 6409 European ancestry participants, respectively. We also conducted transcriptome-wide association analyses and multiphenotype analysis to discover additional associations. Novel genome-wide association study and transcriptome-wide association analysis findings were validated by in vitro functional experiments. Mendelian randomization was performed to assess the causal relationship between these proteins and cardiovascular outcomes. RESULTS: Genome-wide association study meta-analyses identified 4 newly associated loci: 3 with antithrombin levels (GCKR, BAZ1B, and HP-TXNL4B) and 1 with PS levels (ORM1-ORM2). Transcriptome-wide association analyses identified 3 newly associated genes: 1 with antithrombin level (FCGRT), 1 with PC (GOLM2), and 1 with PS (MYL7). In addition, we replicated 7 independent loci reported in previous studies. Functional experiments provided evidence for the involvement of GCKR, SNX17, and HP genes in antithrombin regulation. CONCLUSIONS: The use of larger sample sizes, diverse populations, and a denser imputation reference panel allowed the detection of 7 novel genomic loci associated with plasma antithrombin, PC, and PS levels.


Subjects
Protein C , Protein S , Protein C/genetics , Protein S/genetics , Genome-Wide Association Study , Antithrombins , Transcriptome , Anticoagulants , Antithrombin III/genetics , Polymorphism, Single Nucleotide
13.
Nat Commun ; 14(1): 1411, 2023 03 14.
Article in English | MEDLINE | ID: mdl-36918541

ABSTRACT

The 3-dimensional spatial and 2-dimensional frontal QRS-T angles are measures derived from the vectorcardiogram. They are independent risk predictors for arrhythmia, but the underlying biology is unknown. Using multi-ancestry genome-wide association studies we identify 61 (58 previously unreported) loci for the spatial QRS-T angle (N = 118,780) and 11 for the frontal QRS-T angle (N = 159,715). Seven out of the 61 spatial QRS-T angle loci have not been reported for other electrocardiographic measures. Enrichments are observed in pathways related to cardiac and vascular development, muscle contraction, and hypertrophy. Pairwise genome-wide association studies with classical ECG traits identify shared genetic influences with PR interval and QRS duration. Phenome-wide scanning indicates associations with atrial fibrillation, atrioventricular block and arterial embolism, and genetically determined QRS-T angle measures are associated with fascicular and bundle branch block (and also atrioventricular block for the frontal QRS-T angle). We identify potential biology involved in the QRS-T angles and their genetic relationships with cardiovascular traits and diseases, which may inform future research and risk prediction.


Subjects
Atrioventricular Block , Cardiovascular Diseases , Humans , Cardiovascular Diseases/genetics , Genome-Wide Association Study , Risk Factors , Arrhythmias, Cardiac/genetics , Electrocardiography/methods , Biomarkers
14.
Nat Commun ; 14(1): 1287, 2023 03 09.
Article in English | MEDLINE | ID: mdl-36890159

ABSTRACT

Genome-wide association studies have discovered hundreds of associations between common genotypes and kidney function but cannot comprehensively investigate rare coding variants. Here, we apply a genotype imputation approach to whole exome sequencing data from the UK Biobank to increase sample size from 166,891 to 408,511. We detect 158 rare variants and 105 genes significantly associated with one or more of five kidney function traits, including genes not previously linked to kidney disease in humans. The imputation-powered findings derive support from clinical record-based kidney disease information, such as for a previously unreported splice allele in PKD2, and from functional studies of a previously unreported frameshift allele in CLDN10. This cost-efficient approach boosts statistical power to detect and characterize both known and novel disease susceptibility variants and genes, can be generalized to larger future studies, and generates a comprehensive resource ( https://ckdgen-ukbb.gm.eurac.edu/ ) to direct experimental and clinical studies of kidney disease.


Subjects
Exome , Genome-Wide Association Study , Humans , Exome/genetics , Biological Specimen Banks , Kidney , United Kingdom , Polymorphism, Single Nucleotide
15.
Am J Hum Genet ; 109(11): 1986-1997, 2022 11 03.
Article in English | MEDLINE | ID: mdl-36198314

ABSTRACT

Whole-genome sequencing (WGS) is the gold standard for fully characterizing genetic variation but is still prohibitively expensive for large samples. To reduce costs, many studies sequence only a subset of individuals or genomic regions, and genotype imputation is used to infer genotypes for the remaining individuals or regions without sequencing data. However, not all variants can be well imputed, and the current state-of-the-art imputation quality metric, denoted as standard Rsq, is poorly calibrated for lower-frequency variants. Here, we propose MagicalRsq, a machine-learning-based method that integrates variant-level imputation and population genetics statistics, to provide a better calibrated imputation quality metric. Leveraging WGS data from the Cystic Fibrosis Genome Project (CFGP), and whole-exome sequence data from UK BioBank (UKB), we performed comprehensive experiments to evaluate the performance of MagicalRsq compared to standard Rsq for partially sequenced studies. We found that MagicalRsq aligns better with true R2 than standard Rsq in almost every situation evaluated, for both European and African ancestry samples. For example, when applying models trained from 1,992 CFGP sequenced samples to an independent 3,103 samples with no sequencing but TOPMed imputation from array genotypes, MagicalRsq, compared to standard Rsq, achieved net gains of 1.4 million rare, 117k low-frequency, and 18k common variants, where net gains were gained numbers of correctly distinguished variants by MagicalRsq over standard Rsq. MagicalRsq can serve as an improved post-imputation quality metric and will benefit downstream analysis by better distinguishing well-imputed variants from those poorly imputed. MagicalRsq is freely available on GitHub.
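The "true R2" that the abstract uses as ground truth is conventionally the squared Pearson correlation between imputed dosages and sequenced genotypes at a variant. The sketch below illustrates only that quantity, not MagicalRsq itself; the function name `squared_pearson` and the toy genotype and dosage values are assumptions made up for illustration.

```python
from math import fsum, isnan


def squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences.

    Returns NaN when either sequence has zero variance (e.g. a variant
    that is monomorphic in the evaluated samples).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data for one variant in eight individuals:
# sequenced genotypes (0/1/2 alternate-allele counts) and imputed dosages.
true_genotypes = [0, 1, 2, 0, 1, 0, 0, 2]
imputed_dosages = [0.1, 0.9, 1.8, 0.0, 1.2, 0.2, 0.1, 1.7]

true_r2 = squared_pearson(true_genotypes, imputed_dosages)
print(f"true R2 = {true_r2:.3f}")
```

A quality metric such as standard Rsq or MagicalRsq is an estimate of this value for samples where no sequenced genotypes are available, which is why calibration against true R2 is the natural benchmark.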


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Calibration , Genotype , Machine Learning
16.
Am J Hum Genet ; 109(10): 1727-1741, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36055244

ABSTRACT

Transcriptomics data have been integrated with genome-wide association studies (GWASs) to help understand disease/trait molecular mechanisms. The utility of metabolomics, integrated with transcriptomics and disease GWASs, to understand molecular mechanisms for metabolite levels or diseases has not been thoroughly evaluated. We performed probabilistic transcriptome-wide association and locus-level colocalization analyses to integrate transcriptomics results for 49 tissues in 706 individuals from the GTEx project, metabolomics results for 1,391 plasma metabolites in 6,136 Finnish men from the METSIM study, and GWAS results for 2,861 disease traits in 260,405 Finnish individuals from the FinnGen study. We found that genetic variants that regulate metabolite levels were more likely to influence gene expression and disease risk compared to the ones that do not. Integrating transcriptomics with metabolomics results prioritized 397 genes for 521 metabolites, including 496 previously identified gene-metabolite pairs with strong functional connections and suggested 33.3% of such gene-metabolite pairs shared the same causal variants with genetic associations of gene expression. Integrating transcriptomics and metabolomics individually with FinnGen GWAS results identified 1,597 genes for 790 disease traits. Integrating transcriptomics and metabolomics jointly with FinnGen GWAS results helped pinpoint metabolic pathways from genes to diseases. We identified putative causal effects of UGT1A1/UGT1A4 expression on gallbladder disorders through regulating plasma (E,E)-bilirubin levels, of SLC22A5 expression on nasal polyps and plasma carnitine levels through distinct pathways, and of LIPC expression on age-related macular degeneration through glycerophospholipid metabolic pathways. Our study highlights the power of integrating multiple sets of molecular traits and GWAS results to deepen understanding of disease pathophysiology.


Subjects
Genome-Wide Association Study , Transcriptome , Bilirubin , Carnitine , Glycerophospholipids , Humans , Male , Metabolomics , Quantitative Trait Loci/genetics , Solute Carrier Family 22 Member 5/genetics , Transcriptome/genetics
17.
Am J Hum Genet ; 109(9): 1653-1666, 2022 09 01.
Article in English | MEDLINE | ID: mdl-35981533

ABSTRACT

Understanding the genetic basis of human diseases and traits is dependent on the identification and accurate genotyping of genetic variants. Deep whole-genome sequencing (WGS), the gold standard technology for SNP and indel identification and genotyping, remains very expensive for most large studies. Here, we quantify the extent to which array genotyping followed by genotype imputation can approximate WGS in studies of individuals of African, Hispanic/Latino, and European ancestry in the US and of Finnish ancestry in Finland (a population isolate). For each study, we performed genotype imputation by using the genetic variants present on the Illumina Core, OmniExpress, MEGA, and Omni 2.5M arrays with the 1000G, HRC, and TOPMed imputation reference panels. Using the Omni 2.5M array and the TOPMed panel, ≥90% of bi-allelic single-nucleotide variants (SNVs) are well imputed (r2 > 0.8) down to minor-allele frequencies (MAFs) of 0.14% in African, 0.11% in Hispanic/Latino, 0.35% in European, and 0.85% in Finnish ancestries. There was little difference in TOPMed-based imputation quality among the arrays with >700k variants. Individual-level imputation quality varied widely between and within the three US studies. Imputation quality also varied across genomic regions, producing regions where even common (MAF > 5%) variants were consistently not well imputed across ancestries. The extent to which array genotyping and imputation can approximate WGS therefore depends on reference panel, genotype array, sample ancestry, and genomic location. Imputation quality by variant or genomic region can be queried with our new tool, RsqBrowser, now deployed on the Michigan Imputation Server.


Subjects
High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Gene Frequency/genetics , Genome-Wide Association Study , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Whole Genome Sequencing
18.
Metabolites ; 12(7), 2022 Jun 29.
Article in English | MEDLINE | ID: mdl-35888728

ABSTRACT

Metabolites are intermediates or end products of biochemical processes involved in both health and disease. Here, we take advantage of the well-characterized Cooperative Health Research in South Tyrol (CHRIS) study to perform an exome-wide association study (ExWAS) on absolute concentrations of 175 metabolites in 3294 individuals. To increase power, we imputed the identified variants into an additional 2211 genotyped individuals of CHRIS. In the resulting dataset of 5505 individuals, we identified 85 single-variant genetic associations, of which 39 have not been reported previously. Fifteen associations emerged at ten variants with >5-fold enrichment in CHRIS compared to non-Finnish Europeans reported in the gnomAD database. For example, the CHRIS-enriched ETFDH stop gain variant p.Trp286Ter (rs1235904433-hexanoylcarnitine) and the MCCC2 stop lost variant p.Ter564GlnextTer3 (rs751970792-carnitine) have been found in patients with glutaric acidemia type II and 3-methylcrotonylglycinuria, respectively, but the loci have not been associated with the respective metabolites in a genome-wide association study (GWAS) previously. We further identified three gene-trait associations, where multiple rare variants contribute to the signal. These results not only provide further evidence for previously described associations, but also describe novel genes and mechanisms for diseases and disease-related traits.

19.
Kidney Int ; 102(3): 624-639, 2022 09.
Article in English | MEDLINE | ID: mdl-35716955

ABSTRACT

Estimated glomerular filtration rate (eGFR) reflects kidney function. Progressive eGFR-decline can lead to kidney failure, necessitating dialysis or transplantation. Hundreds of loci from genome-wide association studies (GWAS) for eGFR help explain population cross section variability. Since the contribution of these or other loci to eGFR-decline remains largely unknown, we derived GWAS for annual eGFR-decline and meta-analyzed 62 longitudinal studies with eGFR assessed twice over time in all 343,339 individuals and in high-risk groups. We also explored different covariate adjustment. Twelve genome-wide significant independent variants for eGFR-decline unadjusted or adjusted for eGFR-baseline (11 novel, one known for this phenotype), including nine variants robustly associated across models, were identified. All loci for eGFR-decline were known for cross-sectional eGFR and thus distinguished a subgroup of eGFR loci. Seven of the nine variants showed variant-by-age interaction on eGFR cross section (further about 350,000 individuals), which linked genetic associations for eGFR-decline with age-dependency of genetic cross-section associations. Clinically important were two to four-fold greater genetic effects on eGFR-decline in high-risk subgroups. Five variants associated also with chronic kidney disease progression mapped to genes with functional in-silico evidence (UMOD, SPATA7, GALNTL5, TPPP). An unfavorable versus favorable nine-variant genetic profile showed increased risk, with odds ratios of 1.35 for kidney failure (95% confidence interval 1.03-1.77) and 1.27 for acute kidney injury (95% confidence interval 1.08-1.50), in over 2000 cases each with matched controls. Thus, we provide a large data resource, genetic loci, and prioritized genes for kidney function decline, which help inform drug development pipelines, revealing important insights into the age-dependency of kidney function genetics.


Subjects
N-Acetylgalactosaminyltransferases , Renal Insufficiency, Chronic , Renal Insufficiency , Cross-Sectional Studies , Genetic Loci , Genome-Wide Association Study , Glomerular Filtration Rate/genetics , Humans , Kidney , Longitudinal Studies , N-Acetylgalactosaminyltransferases/genetics , Renal Insufficiency/genetics
20.
Am J Hum Genet ; 109(6): 1007-1015, 2022 06 02.
Article in English | MEDLINE | ID: mdl-35508176

ABSTRACT

Genotype imputation is an integral tool in genome-wide association studies, in which it facilitates meta-analysis, increases power, and enables fine-mapping. With the increasing availability of whole-genome-sequence datasets, investigators have access to a multitude of reference-panel choices for genotype imputation. In principle, combining all sequenced whole genomes into a single large panel would provide the best imputation performance, but this is often cumbersome or impossible due to privacy restrictions. Here, we describe meta-imputation, a method that allows imputation results generated using different reference panels to be combined into a consensus imputed dataset. Our meta-imputation method requires small changes to the output of existing imputation tools to produce necessary inputs, which are then combined using dynamically estimated weights that are tailored to each individual and genome segment. In the scenarios we examined, the method consistently outperforms imputation using a single reference panel and achieves accuracy comparable to imputation using a combined reference panel.
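The combining step described above can be pictured as a weighted average of the dosages that each reference panel produced for the same individual and variant. The sketch below assumes the weights are already given; it does not attempt the dynamic, per-individual and per-segment weight estimation that the abstract attributes to the method. The function name `combine_dosages` and all numbers are hypothetical.

```python
def combine_dosages(dosages_by_panel, weights):
    """Weighted average of per-panel imputed dosages for one individual
    at one variant. Weights are normalized so they need not sum to 1."""
    if len(dosages_by_panel) != len(weights) or not weights:
        raise ValueError("need one weight per panel and at least one panel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * d for w, d in zip(weights, dosages_by_panel)) / total


# Hypothetical example: two panels disagree about one genotype, and the
# weights for this genome segment favour the second panel.
consensus = combine_dosages([1.0, 2.0], [0.25, 0.75])
print(consensus)
```

Because the weights are supplied per call, the same function can be applied with a different weight vector for every individual and genome segment, which mirrors the tailoring the abstract describes.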


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome , Genome-Wide Association Study/methods , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Research Design